ABSTRACT

Genetic changes underlie tumor progression and may lead to cancer-specific
expression of critical genes. Over 1100 publications have described the use
of comparative genomic hybridization (CGH) to analyze the pattern of copy
number alterations in cancer, but very few of the genes affected are known.
Here, we performed high-resolution CGH analysis on cDNA microarrays in
breast cancer and directly compared copy number and mRNA expression levels
of 13,824 genes to quantitate the impact of genomic changes on gene
expression. We identified and mapped the boundaries of 24 independent
amplicons, ranging in size from 0.2 to 12 Mb. Throughout the genome, both
high- and low-level copy number changes had a substantial impact on gene
expression, with 44% of the highly amplified genes showing overexpression
and 10.5% of the highly overexpressed genes being amplified. Statistical
analysis with random permutation tests identified 270 genes whose
expression levels across 14 samples were systematically attributable to
gene amplification. These included most previously described amplified
genes in breast cancer and many novel targets for genomic alterations,
including the HOXB7 gene, the presence of which in a novel amplicon at
17q21.3 was validated in 10.2% of primary breast cancers and associated
with poor patient prognosis. In conclusion, CGH on cDNA microarrays
revealed hundreds of novel genes whose overexpression is attributable to
gene amplification. These genes may provide insights to the clonal
evolution and progression of breast cancer and highlight promising
therapeutic targets.

INTRODUCTION 

Gene expression patterns revealed by cDNA microarrays have facilitated
classification of cancers into biologically distinct categories, some of
which may explain the clinical behavior of the tumors (1-6). Despite this
progress in diagnostic classification, the molecular mechanisms underlying
gene expression patterns in cancer have remained elusive, and the utility
of gene expression profiling in the identification of specific therapeutic
targets remains limited.

Accumulation of genetic defects is thought to underlie the clonal evolution
of cancer. Identification of the genes that mediate the effects of genetic
changes may be important by highlighting transcripts that are actively
involved in tumor progression. Such transcripts and their encoded proteins
would be ideal targets for anticancer therapies, as demonstrated by the
clinical success of new therapies against amplified oncogenes, such as
ERBB2 and EGFR (7, 8), in breast cancer and other solid tumors. Besides
amplifications of known oncogenes, over 20 recurrent regions of DNA
amplification have been mapped in breast cancer by CGH (9, 10). However,
these amplicons are often large and poorly defined, and their impact on
gene expression remains unknown.

Fig. 1. Impact of gene copy number on global gene expression levels. A,
percentage of over- and underexpressed genes (Y axis) according to copy
number ratios (X axis). Threshold values used for over- and underexpression
were >2.184 (global upper 7% of the cDNA ratios) and <0.4826 (global lower
7% of the expression ratios). B, percentage of amplified and deleted genes
according to expression ratios. Threshold values for amplification and
deletion were >1.5 and <0.7.

We hypothesized that genome-wide identification of those gene expression
changes that are attributable to underlying gene copy number alterations
would highlight transcripts that are actively involved in the causation or
maintenance of the malignant phenotype. To identify such transcripts, we
applied a combination of cDNA and CGH microarrays to: (a) determine the
global impact that gene copy number variation plays in breast cancer
development and progression; and (b) identify and characterize those genes
whose mRNA expression is most significantly associated with amplification
of the corresponding genomic template.

The abbreviations used are: CGH, comparative genomic hybridization; FISH,
fluorescence in situ hybridization; RT-PCR, reverse transcription-PCR.

Fig. 2. Genome-wide copy number and expression analysis in the MCF-7 breast
cancer cell line. A, chromosomal CGH analysis of MCF-7. The copy number
ratio profile (blue line) across the entire genome from 1p telomere to Xq
telomere is shown along with ±1 SD (orange lines). The black horizontal
line indicates a ratio of 1.0; red line, a ratio of 0.8; and green line, a
ratio of 1.2. B and C, genome-wide copy number analysis in MCF-7 by CGH on
cDNA microarray. The copy number ratios were plotted as a function of the
position of the cDNA clones along the human genome. In B, individual data
points are connected with a line, and a moving median of 10 adjacent clones
is shown. Horizontal line, the copy number ratio of 1.0. In C, individual
data points are labeled by color coding according to cDNA expression
ratios. The bright red dots indicate the upper 2%, and dark red dots, the
next 5% of the expression ratios in MCF-7 cells (overexpressed genes);
bright green dots indicate the lowest 2%, and dark green dots, the next 5%
of the expression ratios (underexpressed genes); the rest of the
observations are shown with black crosses. The chromosome numbers are shown
at the bottom of the figure, and chromosome boundaries are indicated with a
dashed line.

MATERIALS AND METHODS

Breast Cancer Cell Lines. Fourteen breast cancer cell lines (BT-20, BT-474,
HCC1428, Hs578t, MCF7, MDA-361, MDA-436, MDA-453, MDA-468, SKBR-3, T-47D,
UACC812, ZR-75-1, and ZR-75-30) were obtained from the American Type
Culture Collection (Manassas, VA). Cells were grown under recommended
culture conditions. Genomic DNA and mRNA were isolated using standard
protocols.

Copy Number and Expression Analyses by cDNA Microarrays. The preparation
and printing of the 13,824 cDNA clones on glass slides were performed as
described (11-13). Of these clones, 244 represented uncharacterized
expressed sequence tags, and the remainder corresponded to known genes. CGH
experiments on cDNA microarrays were done as described (14, 15). Briefly,
20 µg of genomic DNA from breast cancer cell lines and normal human WBCs
were digested for 14-18 h with AluI and RsaI (Life Technologies, Inc.,
Rockville, MD) and purified by phenol/chloroform extraction. Six µg of
digested cell line DNAs were labeled with Cy3-dUTP (Amersham Pharmacia) and
normal DNA with Cy5-dUTP (Amersham Pharmacia) using the Bioprime Labeling
kit (Life Technologies, Inc.). Hybridization (14, 15) and posthybridization
washes (13) were done as described. For the expression analyses, a standard
reference (Universal Human Reference RNA; Stratagene, La Jolla, CA) was
used in all experiments. Forty µg of reference RNA were labeled with
Cy3-dUTP and 3.5 µg of test mRNA with Cy5-dUTP, and the labeled cDNAs were
hybridized on microarrays as described (13, 15). For both microarray
analyses, a laser confocal scanner (Agilent Technologies, Palo Alto, CA)
was used to measure the fluorescence intensities at the target locations
using the DEARRAY software (16). After background subtraction, average
intensities at each clone in the test hybridization were divided by the
average intensity of the corresponding clone in the control hybridization.
For the copy number analysis, the ratios were normalized on the basis of
the distribution of ratios of all targets on the array and for the
expression analysis on the basis of 88 housekeeping genes, which were
spotted four times onto the array. Low quality measurements (i.e., copy
number data with mean reference intensity <100 fluorescent units, and
expression data with both test and reference intensity <100 fluorescent
units and/or with spot size <50 units) were excluded from the analysis and
were treated as missing values. The distributions of fluorescence ratios
were used to define cutpoints for increased/decreased copy number. Genes
with CGH ratio >1.43 (representing the upper 5% of the CGH ratios across
all experiments) were considered to be amplified, and genes with ratio
<0.73 (representing the lower 5%) were considered to be deleted.
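
The ratio calculation, quality filter, and cutpoints described above can be
sketched roughly as follows. This is an illustrative sketch only, not the
DEARRAY pipeline: the function names are hypothetical, and scaling by the
median of all valid ratios is an assumed stand-in for normalization "on the
basis of the distribution of ratios of all targets", which the text does
not specify further.

```python
from statistics import median

MIN_INTENSITY = 100.0  # fluorescent units; quality threshold stated in the text


def copy_number_ratios(test, reference, min_intensity=MIN_INTENSITY):
    """Test/reference intensity ratio per clone.

    Spots whose reference intensity is below the threshold are treated
    as missing values (None), as described for the copy number data.
    """
    return [t / r if r >= min_intensity else None
            for t, r in zip(test, reference)]


def normalize_by_median(ratios):
    """Scale ratios so that the median valid ratio is 1.0 (assumed method)."""
    center = median(x for x in ratios if x is not None)
    return [None if x is None else x / center for x in ratios]


def classify(ratio, amp_cut=1.43, del_cut=0.73):
    """Apply the amplification/deletion cutpoints given in the text."""
    if ratio is None:
        return "missing"
    if ratio > amp_cut:
        return "amplified"
    if ratio < del_cut:
        return "deleted"
    return "unchanged"


# Toy intensities (invented for illustration only).
test_signal = [300.0, 150.0, 50.0, 200.0, 400.0]
ref_signal = [100.0, 150.0, 100.0, 50.0, 200.0]
normalized = normalize_by_median(copy_number_ratios(test_signal, ref_signal))
calls = [classify(x) for x in normalized]
```

In this toy example the fourth clone is dropped because its reference
intensity is below 100 units, and the remaining clones are classified
against the 1.43 and 0.73 cutpoints.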

Statistical Analysis of CGH and cDNA Microarray Data. To evaluate the
influence of copy number alterations on gene expression, we applied the
following statistical approach. CGH and cDNA calibrated intensity ratios
were log-transformed and normalized using median centering of the values in
each cell line. Furthermore, cDNA ratios for each gene across all 14 cell
lines were median centered. For each gene, the CGH data were represented by
a vector that was labeled 1 for amplification (ratio, >1.43) and 0 for no
amplification. Amplification was correlated with gene expression using the
signal-to-noise statistics (1). We calculated a weight, $w_g$, for each
gene as follows:

$$w_g = \frac{m_{g1} - m_{g0}}{\sigma_{g1} + \sigma_{g0}}$$

where $m_{g1}$, $\sigma_{g1}$ and $m_{g0}$, $\sigma_{g0}$ denote the means
and SDs for the expression levels for amplified and nonamplified cell
lines, respectively. To assess the statistical significance of each weight,
we performed 10,000 random permutations of the label vector. The
probability that a gene had a larger or equal weight by random permutation
than the original weight was denoted by $\alpha$. A low $\alpha$ (<0.05)
indicates a strong association between gene expression and amplification.
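
As a rough illustration of the signal-to-noise weight and the
label-permutation test, the following sketch computes both for a single
gene. It assumes the expression values are already log-transformed and
median centered; the function names, the use of the sample SD, and the
fixed random seed are assumptions made for the example, not details taken
from the original analysis.

```python
import random
from statistics import mean, stdev


def signal_to_noise(expr, labels):
    """Difference of group means divided by the sum of group SDs.

    labels[i] is 1 if the gene is amplified in cell line i, else 0.
    Returns None when a group has fewer than two members or when the
    SDs sum to zero, because the weight is then undefined.
    """
    amp = [x for x, lab in zip(expr, labels) if lab == 1]
    non = [x for x, lab in zip(expr, labels) if lab == 0]
    if len(amp) < 2 or len(non) < 2:
        return None
    denom = stdev(amp) + stdev(non)
    if denom == 0:
        return None
    return (mean(amp) - mean(non)) / denom


def permutation_alpha(expr, labels, n_perm=10000, seed=0):
    """Fraction of label permutations whose weight is >= the observed one."""
    observed = signal_to_noise(expr, labels)
    if observed is None:
        return None
    rng = random.Random(seed)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        w = signal_to_noise(expr, shuffled)
        if w is not None and w >= observed:
            hits += 1
    return hits / n_perm


# Toy data for 14 cell lines (invented): 4 amplified, 10 not amplified.
labels = [1, 1, 1, 1] + [0] * 10
expr = [3.0, 3.2, 2.9, 3.1,
        0.1, -0.2, 0.0, 0.2, -0.1, 0.1, 0.0, -0.3, 0.2, 0.0]
weight = signal_to_noise(expr, labels)
alpha = permutation_alpha(expr, labels, n_perm=2000)
```

With labels that separate the expression values this cleanly, very few
permutations reach the observed weight, so the estimated alpha falls well
below 0.05.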

Genomic Localization of cDNA Clones and Amplicon Mapping. Each cDNA clone
on the microarray was assigned to a Unigene cluster using the Unigene Build
141. A database of genomic sequence alignment information for mRNA
sequences was created from the August 2001 freeze of the University of
California Santa Cruz's GoldenPath database. The chromosome and bp
positions for each cDNA clone were then retrieved by relating these data
sets. Amplicons were defined as a CGH copy number ratio >2.0 in at least
two adjacent clones in two or more cell lines or a CGH ratio >2.0 in at
least three adjacent clones in a single cell line. The amplicon start and
end positions were extended to include neighboring nonamplified clones
(ratio, <1.5). The amplicon size determination was partially dependent on
local clone density.

Internet address: http://research.nhgri.nih.gov/ [remainder of address illegible].
Internet address: www.genome.ucsc.edu.

Table 1 Summary of independent amplicons in 14 breast cancer cell lines by
CGH microarray

Location            Start (Mb)     End (Mb)       Size (Mb)
[illegible]         [illegible]    [illegible]    [illegible]
[illegible]         [illegible]    [illegible]    [illegible]
[illegible]         [illegible]    [illegible]    [illegible]
[illegible]         [illegible]    [illegible]    2.7
[illegible]         [illegible]    [illegible]    [illegible]
[illegible]         [illegible]    130.96         5.2
[illegible]         [illegible]    140.00         0.7
[illegible]         [illegible]    92.46          6.0
[illegible]         [illegible]    103.05         4.6
[illegible]         [illegible]    142.15         12.3
[illegible]         [illegible]    152.16         1.0
9p13                38.65          39.25          0.6
[illegible]         [illegible]    [illegible]    4.2
[illegible]         86.70          87.62          0.9
17q11               29.30          [illegible]    1.0
17q12-q21.2         39.79          42.80          3.0
17q21.32-q21.33     52.47          55.80          3.3
17q22-q23.3         63.81          69.70          5.9
17q23.3-q24.3       69.93          74.99          5.1
19q13               40.63          41.40          0.8
20q11.22            34.59          35.85          1.3
20q13.12            44.00          45.62          1.6
20q13.12-q13.13     46.45          49.43          3.0
[illegible]         51.32          59.12          7.8
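
The amplicon definition can be read as two rules applied to clones ordered
by genomic position, sketched below. This is one possible reading, stated
here as an assumption: rule 1 requires the same pair of adjacent clones to
exceed 2.0 in at least two cell lines, rule 2 requires a run of three or
more clones in one cell line, missing values break adjacency, and the input
covers a single chromosome. The function name and data layout are
hypothetical, and the extension of boundaries to neighboring nonamplified
clones is not included.

```python
def find_amplicons(ratios_by_line, threshold=2.0):
    """Return (first_index, last_index) clone ranges that meet the definition.

    ratios_by_line maps a cell line name to a list of CGH ratios ordered
    by genomic position on one chromosome; None marks a missing value.
    """
    rows = list(ratios_by_line.values())
    n = len(rows[0])
    high = [[r is not None and r > threshold for r in row] for row in rows]
    marked = [False] * n

    # Rule 1: the same two adjacent clones are high in two or more cell lines.
    for i in range(n - 1):
        support = sum(1 for row in high if row[i] and row[i + 1])
        if support >= 2:
            marked[i] = marked[i + 1] = True

    # Rule 2: three or more adjacent clones are high in a single cell line.
    for row in high:
        run_start = None
        for i in range(n + 1):
            inside = i < n and row[i]
            if inside and run_start is None:
                run_start = i
            elif not inside and run_start is not None:
                if i - run_start >= 3:
                    for j in range(run_start, i):
                        marked[j] = True
                run_start = None

    # Merge marked clones into contiguous index ranges.
    amplicons = []
    start = None
    for i in range(n + 1):
        if i < n and marked[i]:
            if start is None:
                start = i
        elif start is not None:
            amplicons.append((start, i - 1))
            start = None
    return amplicons


# Toy ratios for three cell lines (invented for illustration).
toy = {
    "A": [1.0, 2.5, 2.6, 1.0, 1.1, 3.0, 3.1, 3.2, 1.0, 2.2],
    "B": [1.0, 2.1, 2.3, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.4],
    "C": [1.0] * 10,
}
found = find_amplicons(toy)
```

Here clones 1-2 qualify through rule 1 (two cell lines) and clones 5-7
through rule 2 (one cell line), while the isolated high clone at index 9
does not qualify.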

FISH. Dual-color interphase FISH to breast cancer cell lines was done as
described (17). Bacterial artificial chromosome clone RP11-361K8 was
labeled with SpectrumOrange (Vysis, Downers Grove, IL), and
SpectrumOrange-labeled probe for EGFR was obtained from Vysis.
SpectrumGreen-labeled chromosome 7 and 17 centromere probes (Vysis) were
used as a reference. A tissue microarray containing 612 formalin-fixed,
paraffin-embedded primary breast cancers (17) was applied in FISH analyses
as described (18). The use of these specimens was approved by the Ethics
Committee of the University of Basel and by the NIH. Specimens containing a
2-fold or higher increase in the number of test probe signals, as compared
with corresponding centromere signals, in at least 10% of the tumor cells
were considered to be amplified. Survival analysis was performed using the
Kaplan-Meier method and the log-rank test.

RT-PCR. The HOXB7 expression level was determined relative to GAPDH.
Reverse transcription and PCR amplification were performed using Access
RT-PCR System (Promega Corp., Madison, WI) with 10 ng of mRNA as a
template. HOXB7 primers were 5'-OA(3CAGAGGGA(rrc<3GACTT-3'
and 5'^GTCAGGTAGCGArrcrrAO-3'.

RESULTS

Global Effect of Copy Number on Gene Expression. 13,824 arrayed cDNA clones
were applied for analysis of gene expression and gene copy number (CGH
microarrays) in 14 breast cancer cell lines. The results illustrate a
considerable influence of copy number on gene expression patterns. Up to
44% of the highly amplified transcripts (CGH ratio, >2.5) were
overexpressed (i.e., belonged to the global upper 7% of expression ratios),
compared with only 6% for genes with normal copy number levels (Fig. 1A).
Conversely, 10.5% of the transcripts with high-level expression (cDNA
ratio, >10) showed increased copy number (Fig. 1B). Low-level copy number
increases and decreases were also associated with similar, although less
dramatic, outcomes on gene expression (Fig. 1).
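
The comparison in Fig. 1A amounts to binning genes by copy number ratio and
counting the overexpressed fraction in each bin. A minimal sketch follows;
the 2.184 cutoff is the overexpression threshold quoted in the Fig. 1
legend, whereas the function name, the bin edges, and the toy data are
hypothetical.

```python
def fraction_overexpressed(pairs, cn_bins, expr_cut=2.184):
    """Fraction of overexpressed genes within each copy number bin.

    pairs is a list of (copy_number_ratio, expression_ratio) tuples.
    cn_bins is a list of (low, high) half-open intervals [low, high).
    Empty bins yield None.
    """
    results = []
    for low, high in cn_bins:
        in_bin = [e for c, e in pairs if low <= c < high]
        if not in_bin:
            results.append(((low, high), None))
            continue
        over = sum(1 for e in in_bin if e > expr_cut)
        results.append(((low, high), over / len(in_bin)))
    return results


# Toy measurements (invented): (copy number ratio, expression ratio).
toy_pairs = [(3.0, 5.0), (2.8, 1.0), (2.6, 3.0),
             (1.0, 1.0), (1.1, 2.5), (0.9, 0.8), (1.0, 1.2)]
toy_bins = [(0.7, 1.5), (2.5, float("inf"))]
summary = fraction_overexpressed(toy_pairs, toy_bins)
```

In the toy data one of four genes with near-normal copy number is
overexpressed, against two of three highly amplified genes.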

Identification of Distinct Breast Cancer Amplicons. Base-pair locations
obtained for 11,994 cDNAs (86.8%) were used to plot copy number changes as
a function of genomic position (Fig. 2, Supplement Fig. A). The average
spacing of clones throughout the genome was 267 kb. This high-resolution
mapping identified 24 independent breast cancer amplicons, spanning from
0.2 to 12 Mb of DNA (Table 1). Several amplification sites detected
previously by chromosomal CGH were validated, with 1q21, 17q12-q21.2,
17q22-q23, 20q13.1, and 20q13.2 regions being most commonly amplified.
Furthermore, the boundaries of these amplicons were precisely delineated.
In addition, novel amplicons were identified at 9p13 (38.65-39.25 Mb) and
17q21.3 (52.47-55.80 Mb).

Direct Identification of Putative Amplification Target Genes. The cDNA/CGH
microarray technique enables the direct correlation of copy number and
expression data on a gene-by-gene basis throughout the genome. We directly
annotated high-resolution CGH plots with gene expression data using color
coding. Fig. 2C shows that most of the amplified genes in the MCF-7 breast
cancer cell line at 1p13, 17q22-q23, and 20q13 were highly overexpressed. A
view of chromosome 7 in the MDA-468 cell line implicates EGFR as the most
highly overexpressed and amplified gene at 7p11-p12 (Fig. 3A). In BT-474,
the two known amplicons at 17q12 and 17q22-q23 contained numerous highly
overexpressed genes (Fig. 3B). In addition, several genes, including the
homeobox genes HOXB2 and HOXB7, were highly amplified in a previously
undescribed independent amplicon at 17q21.3. HOXB7 was systematically
amplified (as validated by FISH, Fig. 3B, inset) as well as overexpressed
(as verified by RT-PCR, data not shown) in BT-474, UACC812, and ZR-75-30
cells. Furthermore, this novel amplification was validated to be present in
10.2% of 363 primary breast cancers by FISH to a tissue microarray and was
associated with poor prognosis of the patients (P = 0.001).

Fig. 3. Annotation of gene expression data on CGH microarray profiles. A,
genes in the 7p11-p12 amplicon in the MDA-468 cell line are highly
expressed (red dots) and include the EGFR oncogene. B, several genes in the
17q12, 17q21.3, and 17q23 amplicons in the BT-474 breast cancer cell line
are highly overexpressed (red) and include the HOXB7 gene. The data labels
and color coding are as indicated for Fig. 2C. Insets show chromosomal CGH
profiles for the corresponding chromosomes and validation of the increased
copy number by interphase FISH using EGFR (red) and chromosome 7 centromere
probe (green) to MDA-468 (A) and HOXB7-specific probe (red) and chromosome
17 centromere (green) to BT-474 cells (B).

Fig. 4. List of 50 genes with a statistically significant correlation
(α value <0.05) between gene copy number and gene expression. Name,
chromosomal location, and the α value for each gene are indicated. The
genes have been ordered according to their position in the genome. The
color maps on the right illustrate the copy number and expression ratio
patterns in the 14 cell lines. The key to the color code is shown at the
bottom of the graph. Gray squares, missing values. The complete list of 270
genes is shown in supplemental Fig. B.

Statistical Identification and Characterization of 270 Highly Expressed
Genes in Amplicons. Statistical comparison of expression levels of all
genes as a function of gene amplification identified 270 genes whose
expression was significantly influenced by copy number across all 14 cell
lines (Fig. 4, Supplemental Fig. B). According to the gene ontology data,
91 of the 270 genes represented hypothetical proteins or genes with no
functional annotation, whereas 179 had associated functional information
available. Of these, 151 (84%) are implicated in apoptosis, cell
proliferation, signal transduction, and transcription, whereas 28 (16%) had
functional annotations that could not be directly linked with cancer.

Internet address: http://www.geneontology.org/.



DISCUSSION 

The importance of recurrent gene and chromosome copy number changes in the
development and progression of solid tumors has been characterized in >1000
publications applying CGH (9, 10), as well as in a large number of other
molecular cytogenetic, cytogenetic, and molecular genetic studies. The
effects of these somatic genetic changes on gene expression levels have
remained largely unknown, although a few studies have explored gene
expression changes occurring in specific amplicons (15, 19-21). Here, we
applied genome-wide cDNA microarrays to identify transcripts whose
expression changes were attributable to underlying gene copy number
alterations in breast cancer.

The overall impact of copy number on gene expression patterns was
substantial, with the most dramatic effects seen in the case of high-level
copy number increase. Low-level copy number gains and losses also had a
significant influence on expression levels of genes in the regions
affected, but these effects were more subtle on a gene-by-gene basis than
those of high-level amplifications. However, the impact of low-level gains
on the dysregulation of gene expression patterns in cancer may be equally
important if not more important than that of high-level amplifications.
Aneuploidy and low-level gains and losses of chromosomal arms represent the
most common types of genetic alterations in breast and other cancers and,
therefore, have an influence on many genes. Our results in breast cancer
extend the recent studies on the impact of aneuploidy on global gene
expression patterns in yeast cells, acute myeloid leukemia, and a prostate
cancer model system (22-24).

Internet address: http://www.ncbi.nlm.nih.gov/entrez.

The CGH microarray analysis identified 24 independent breast cancer
amplicons. We defined the precise boundaries for many amplicons detected
previously by chromosomal CGH (9, 10, 25, 26) and also discovered novel
amplicons that had not been detected previously, presumably because of
their small size (only 1-2 Mb) or close proximity to other larger
amplicons. One of these novel amplicons involved the homeobox gene region
at 17q21.3 and led to the overexpression of the HOXB7 and HOXB2 genes. The
homeodomain transcription factors are known to be key regulators of
embryonic development and have been occasionally reported to undergo
aberrant expression in cancer (27, 28). HOXB7 transfection induced cell
proliferation in melanoma, breast, and ovarian cancer cells and increased
tumorigenicity and angiogenesis in breast cancer (29-32). The present
results imply that gene amplification may be a prominent mechanism for
overexpressing HOXB7 in breast cancer and suggest that HOXB7 contributes to
tumor progression and confers an aggressive disease phenotype in breast
cancer. This view is supported by our finding of amplification of HOXB7 in
10% of 363 primary breast cancers, as well as an association of
amplification with poor prognosis of the patients.

We carried out a systematic search to identify genes whose expression
levels across all 14 cell lines were attributable to amplification status.
Statistical analysis revealed 270 such genes (representing ~2% of all genes
on the array), including not only previously described amplified genes,
such as MYC, EGFR, ribosomal protein s6 kinase, and AIB3, but also numerous
novel genes such as NRAS-related gene (1p13), syndecan-2 (8q22), and bone
morphogenic protein (20q13.1), whose activation by amplification may
similarly promote breast cancer progression. Most of the 270 genes have not
been implicated previously in breast cancer development and suggest novel
pathogenetic mechanisms. Although we would not expect all of them to be
causally involved, it is intriguing that 84% of the genes with associated
functional information were implicated in apoptosis, cell proliferation,
signal transduction, transcription, or other cellular processes that could
directly imply a possible role in cancer progression. Therefore, a detailed
characterization of these genes may provide biological insights to breast
cancer progression and might lead to the development of novel therapeutic
strategies.

In summary, we demonstrate application of cDNA microarrays to the analysis
of both copy number and expression levels of over 12,000 transcripts
throughout the breast cancer genome, roughly once every 267 kb. This
analysis provided: (a) evidence of a prominent global influence of copy
number changes on gene expression levels; (b) a high-resolution map of 24
independent amplicons in breast cancer; and (c) identification of a set of
270 genes, the overexpression of which was statistically attributable to
gene amplification. Characterization of a novel amplicon at 17q21.3
implicated amplification and overexpression of the HOXB7 gene in breast
cancer, including a clinical association between HOXB7 amplification and
poor patient prognosis. Overall, our results illustrate how the
identification of genes activated by gene amplification provides a powerful
approach to highlight genes with an important role in cancer as well as to
prioritize and validate putative targets for therapy development.